Simultaneous structural variation discovery among multiple paired-end sequenced genomes.

Authors

  • Fereydoun Hormozdiari
  • Iman Hajirasouliha
  • Andrew McPherson
  • Evan E Eichler
  • S Cenk Sahinalp
Abstract

With the increasing popularity of whole-genome shotgun sequencing (WGSS) via high-throughput sequencing technologies, it is becoming highly desirable to perform comparative studies involving multiple individuals (from a specific population, race, or a group sharing a particular phenotype). The conventional approach for a comparative genome variation study involves two key steps: (1) each paired-end high-throughput sequenced genome is compared with a reference genome and its (structural) differences are identified; (2) the lists of structural variants in each genome are compared against each other. In this study we propose to move away from this two-step approach to a novel one in which all genomes are compared with the reference genome simultaneously, to obtain much higher accuracy in structural variation detection. For this purpose, we introduce the maximum parsimony-based simultaneous structural variation discovery problem for a set of high-throughput sequenced genomes and provide efficient algorithms to solve it. We compare the proposed framework with the conventional framework on the genomes of the Yoruban mother-father-child trio, as well as the CEU trio of European ancestry (both sequenced on Illumina platforms). We observed that the conventional framework predicts an unexpectedly high number of de novo variations in the child in comparison to the parents and misses some of the known variations. Our proposed framework, on the other hand, not only significantly reduces the number of incorrectly predicted de novo variations but also predicts more of the known (true) variations.
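As a rough illustration of step (2) of the conventional approach described in the abstract, the following Python sketch compares independently produced call lists for a trio and flags child calls with no matching parental call as putative de novo events. It is not the authors' algorithm: the `SV` record, the `putative_de_novo` helper, and the 50% reciprocal-overlap threshold are all assumptions made for this example.

```python
from typing import List, NamedTuple


class SV(NamedTuple):
    """Hypothetical record for one structural variant call."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # e.g. "DEL", "INV"


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions; 0.0 if the calls cannot match."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def putative_de_novo(child: List[SV], mother: List[SV], father: List[SV],
                     min_ro: float = 0.5) -> List[SV]:
    """Child calls with no parental call reaching the (assumed) overlap threshold."""
    parents = mother + father
    return [c for c in child
            if not any(reciprocal_overlap(c, p) >= min_ro for p in parents)]


# Toy call lists (made-up coordinates, for illustration only).
child = [SV("chr1", 1000, 2000, "DEL"), SV("chr2", 500, 900, "DEL")]
mother = [SV("chr1", 1100, 2050, "DEL")]
father: List[SV] = []

de_novo = putative_de_novo(child, mother, father)
print(de_novo)
```

In this toy input the chr2 deletion is reported as de novo only because neither parental list contains it. If a parent actually carries the variant but it was missed during that parent's independent calling, the comparison cannot tell the difference, which is consistent with the inflated de novo counts the abstract reports for the two-step framework.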

Similar resources

Simultaneous Structural Variation Discovery in Multiple Paired-End Sequenced Genomes

As whole genome shotgun sequencing (WGSS) becomes more accessible using high-throughput sequencing technologies, undertaking comparative studies among different individuals (based on population, race, or genetic disease) is the next logical step. In this paper, we propose a paradigm shift in variation comparative studies (specifically structural variation) away from the conventional two step ap...

Combinatorial Algorithms for Structural Variation Detection in High Throughput Sequenced Genomes

Recent studies show that along with single nucleotide polymorphisms and small indels, larger structural variants among human individuals are common. The Human Genome Structural Variation Project aims to identify and classify deletions, insertions, and inversions (>5 Kbp) in a small number of normal individuals with a fosmid-based paired-end sequencing approach using traditional sequencing techn...

inGAP-sv: a novel scheme to identify and visualize structural variation from paired end mapping data

Mining genetic variation from personal genomes is a crucial step towards investigating the relationship between genotype and phenotype. However, compared to the detection of SNPs and small indels, characterizing large and particularly complex structural variation is much more difficult and less intuitive. In this article, we present a new scheme (inGAP-sv) to detect and visualize structural var...

Solving Generalized FLSA with ADMM Algorithm for Copy Number Variation Detection in Human Genomes

Structural variations (SVs) account for most of the bases that vary among human genomes [3] and are believed to contribute significantly to variation between individuals, possibly with an effect as large as that of single nucleotide polymorphisms (SNPs) [9, 6]. Although some types of SV (such as copy number variation (CNV)) have cost-effective methods available for their discovery (SNP and CGH arrays [3]...

Reprever: resolving low-copy duplicated sequences using template driven assembly

Genomic sequence duplication is an important mechanism for genome evolution, often resulting in large sequence variations with implications for disease progression. Although paired-end sequencing technologies are commonly used for structural variation discovery, the discovery of novel duplicated sequences remains an unmet challenge. We analyze duplicons starting from identified high-copy number...

Journal:
  • Genome Research

Volume 21, Issue 12

Pages: -

Publication date: 2011